Extraction of Lexico-Semantic Classes from Text
Authors
Abstract
This paper describes an unsupervised method for extracting lexico-semantic classes from POS-annotated corpora. The method consists of building bi-dimensional clusters of both words and local syntactic contexts. Each cluster, which represents a lexico-semantic class such as “entities in danger”, is the result of merging its most prototypical constituents (words and contexts). The generated clusters will be used as centroids for word classification. The basic intuition underlying our corpus-based approach is that similar classes can be aggregated to generate either more specific or more generic classes without inducing odd associations between contexts and words. A new class is generated by specification if we take the union of the constituent contexts (intension expansion) while the words are intersected (extension reduction). A new class is generated by abstraction if the local contexts are intersected (intension reduction) while we take the union of the constituent words (extension expansion). Intersecting words and local contexts accurately allows us to generate tight clusters with prototypical constituents.
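The two aggregation operations can be sketched as plain set operations on (words, contexts) pairs. This is a minimal illustration under stated assumptions, not the authors' implementation. The class name `LexicoSemanticClass`, the functions `by_specification`, `by_abstraction` and `classify`, the empty-set guard, the overlap-count scoring used for centroid classification, and the `relation:head` context notation are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class LexicoSemanticClass:
    """Hypothetical bi-dimensional cluster: words (extension) paired with
    the local syntactic contexts they share (intension)."""
    words: FrozenSet[str]
    contexts: FrozenSet[str]


def by_specification(a: LexicoSemanticClass,
                     b: LexicoSemanticClass) -> Optional[LexicoSemanticClass]:
    """Union of contexts (intension expansion), intersection of words
    (extension reduction): yields a more specific class."""
    words = a.words & b.words
    if not words:
        # Assumed guard: no shared words, so no coherent specific class.
        return None
    return LexicoSemanticClass(words, a.contexts | b.contexts)


def by_abstraction(a: LexicoSemanticClass,
                   b: LexicoSemanticClass) -> Optional[LexicoSemanticClass]:
    """Intersection of contexts (intension reduction), union of words
    (extension expansion): yields a more generic class."""
    contexts = a.contexts & b.contexts
    if not contexts:
        # Assumed guard: no shared contexts, so no coherent generic class.
        return None
    return LexicoSemanticClass(a.words | b.words, contexts)


def classify(word_contexts: FrozenSet[str],
             classes: Iterable[LexicoSemanticClass]) -> Optional[LexicoSemanticClass]:
    """Assign a word to the class (used as a centroid) whose contexts overlap
    most with the word's observed contexts. The overlap count is an assumed
    placeholder for whatever similarity measure the method actually uses."""
    best = max(classes, key=lambda c: len(word_contexts & c.contexts), default=None)
    if best is None or not (word_contexts & best.contexts):
        return None
    return best


# Toy data in an assumed "relation:head" context notation.
c1 = LexicoSemanticClass(frozenset({"child", "species", "forest"}),
                         frozenset({"obj_of:save", "obj_of:protect"}))
c2 = LexicoSemanticClass(frozenset({"species", "forest", "whale"}),
                         frozenset({"obj_of:protect", "obj_of:endanger"}))

spec = by_specification(c1, c2)
abst = by_abstraction(c1, c2)
print(sorted(spec.words), sorted(spec.contexts))
# ['forest', 'species'] ['obj_of:endanger', 'obj_of:protect', 'obj_of:save']
print(sorted(abst.words), sorted(abst.contexts))
# ['child', 'forest', 'species', 'whale'] ['obj_of:protect']
```

If every word in an input cluster is assumed to co-occur with all of that cluster's contexts, both operations preserve that property. Specification keeps only words found in both clusters, so each surviving word has been seen with every context in the union. Abstraction keeps only contexts shared by both clusters, so each word in the union has been seen with every surviving context. This is one way to read the claim that aggregation does not induce odd word–context associations.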
Similar resources
Incorporating Lexico-semantic Heuristics into Coreference Resolution Sieves for Named Entity Recognition at Document-level
This paper explores the incorporation of lexico-semantic heuristics into a deterministic Coreference Resolution (CR) system for classifying named entities at document-level. The most precise sieves of a CR tool are enriched with both a set of heuristics for merging named entities labeled with different classes and also with some constraints that avoid the incorrect merging of similar mention...
Variation and Semantic Relation Interpretation: Linguistic and Processing Issues
Studies in linguistics define lexico-syntactic patterns to characterize the linguistic utterances that can be interpreted with semantic relations. Because patterns are assumed to reflect linguistic regularities that have a stable interpretation, several software tools implement such patterns to extract semantic relations from text. Nevertheless, a thorough analysis of pattern occurrences in various c...
Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
A challenging problem in open information extraction and text mining is the learning of the selectional restrictions of semantic relations. We propose a minimally supervised bootstrapping algorithm that uses a single seed and a recursive lexico-syntactic pattern to learn the arguments and the supertypes of a diverse set of semantic relations from the Web. We evaluate the performance of our algo...
Extraction of Lexico-Syntactic Information and Acquisition of Causality Schemas for Text Annotation
We present the INSYSE method for the annotation of texts, based on extraction of semantic relations from syntactic structures. We apply this method to a corpus of 5000 Medline abstracts about central nervous system diseases and gene interactions. Our cooperative approach focuses on (1) extracting lexico-syntactic information from sentences in the corpus comprising causation lexemes and (2) elab...
Using Lexico-Syntactic Ontology Design Patterns for Ontology Creation and Population
In this paper we discuss the use of information extraction techniques involving lexico-syntactic patterns to generate ontological information from unstructured text and either create a new ontology from scratch or augment an existing ontology with new entities. We refine the patterns using a term extraction tool and some semantic restrictions derived from WordNet and VerbNet, in order to preven...